Abstract

We perform a Bayesian analysis on abundance data for ten species of North American duck, using the results to investigate the evidence in favour of biologically motivated hypotheses about the causes and mechanisms of density dependence in these species. We explore the capabilities of our methods to detect density dependent effects, both by simulation and through analyses of real data. The effect of the prior choice on predictive accuracy is also examined. We conclude that our priors, which are motivated by considering the dynamics of the system of interest, offer clear advances over the priors used by previous authors for the duck data sets.

We use this analysis as a motivating example to demonstrate the importance of careful parameter prior selection if we are to perform a balanced model selection procedure. We also present some simple guidelines that can be followed in a wide variety of modelling frameworks where vague parameter prior choice is not a viable option. These will produce parameter priors that not only greatly reduce bias in selecting certain models, but improve the predictive ability of the resulting model-averaged predictor.
1 Introduction

Density dependence within a species is usually the primary means of numerical self-regulation, the mechanism by which a species can maintain a steady population trajectory in an environment that produces unexpected events of both beneficial and harmful natures. Turchin (1995), in a synthesis of several other sources, states that density dependence is necessary for a regulated population. That is, a population without it is almost certain to be numerically unstable, with an undefined carrying capacity.

It is important to discern the magnitude of density dependence a species exhibits, as well as the time lag over which it operates. Knowledge of a species' likely response to natural as well as synthetic shocks will assist in effective species management. Statistically this is a challenging problem which does not usually admit closed-form mathematical analysis.

The debate over the relevance of density dependence has been at times acrimonious, as summarised in Turchin (1995). The quote from that paper which we take as our starting point on this issue is that available evidence "is entirely consistent with the universal applicability of the density dependence model." (Turchin 1995, p. 31). As such, we seek to make what statistical inferences we can about the magnitude and time period of such effects.

There are several biological hypotheses as to the causes of density dependence, both in general and in the specific case of North American ducks, our motivating example. These have differing implications for the likely degree of density dependence to be expected in such species.

We analyze ten species of duck, including both diving and dabbling ducks, between which there is reason to expect a distinction in density dependence profile. The hypothesis tested (and to an extent borne out) by Jamieson and Brooks (2004) was that diving ducks might, in response to a poor year (low habitat and/or food availability), delay breeding for a year. This would imply a delayed density dependence in diving ducks that would not be present in dabbling ducks.
In contrast, Sargeant et al. (1984) looked at red fox (Vulpes vulpes) predation on both diving and dabbling ducks, and concluded that dabbling ducks are significantly more vulnerable to predation of this kind. The red fox is only one predator of ducks in North America, but it is one of the primary predators and, in common with many other duck predators, it is a generalist. A hypothesis of Bjørnstad et al. (1995), tested in Viljugrein et al. (2005), suggests that this would induce more immediate density dependence in the affected species, since both ducks and eggs are potential predatory targets. This would imply both first and second order density dependence in dabbling ducks; less so in diving ducks.

It is apparent that there are hypotheses that produce differing predictions as to the nature of density dependence in these species. We aim to provide a thorough statistical analysis using historical count data provided by US Fish and Wildlife Service (2010). We will take a Bayesian standpoint when analyzing these data. This is not because a classical analysis is impossible, but rather because we believe that common sense can be translated into a meaningful, informative parameter prior.

Inference about the degree of density dependence under this framework is a Bayesian model selection problem. Link and Barker (2006) illustrate the principles and some of the issues inherent to this class of problem. We demonstrate that choosing an informative prior (using simple rules which we will describe) is both necessary for a balanced model selection procedure, and improves the accuracy with which we can predict future population levels.



63 



The outline for the paper is as follows. First we summarise a widely used model for 

64 density dependence in the following subsection. Then in section 2 we consider the problem of 

65 choosing a Bayesian prior to use in our analysis. Section 3 is a simulation study to exhibit the 
ee improvements we offer over previous approaches, before we analyze real data in section 4. We 
67 finish with a discussion of our results and lessons learned that can usefully be applied to a wider 
es class of problems than the specific case of density dependence in North American ducks. 
1.1 An Autoregressive Model for Density Dependence

We consider the density dependence model from Dennis and Taper (1994). Let $x_t$ be the log-population size in year $t$. The evolution of $x_t$ over time is governed by the stochastic update

$$x_t = x_{t-1} + b_0 + \sum_{i=1}^{k} b_i e^{x_{t-i}} + \epsilon_t, \qquad \epsilon_t \sim N(0, \sigma^2). \tag{1}$$

The parameters are interpreted as

$k$ : degree (maximum time lag) of density dependence, in years
$b_0$ : uninhibited exponential growth rate
$b_{1:k}$ : density dependence effects at different time-lags
$\sigma^2$ : species (and unmodelled covariate) volatility

The number of $b$ parameters is $k + 1$, so $k$ is a model order parameter. If $k = 0$, then this process simplifies to a random walk with drift. Also, if several different mechanisms induce a density dependence effect at the same time lag, then the appropriate component of $b$ will in effect be a summary statistic measuring the sum of all effects at that time lag.

We do not in general observe a true and accurate count of the species abundance. We observe data $y_t$ which will include noise which may vary in intensity from year to year. We assume that this observation process is Gaussian, i.e.

$$y_t \sim N(x_t, S_t^2) \tag{2}$$

and we assume that $S_t$ is known for each year $t = 1, \ldots, T$. The full model as specified by equations (1) and (2) is thus a state space formulation.
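To make the state space formulation concrete, the following is a minimal simulation sketch of equations (1) and (2). The function names `simulate_states` and `observe` are hypothetical and chosen only for illustration; the sketch assumes that `b` holds $(b_0, b_1, \ldots, b_k)$ and that enough starting values are supplied to cover the longest lag.

```python
import math
import random

def simulate_states(b, sigma, n_years, x_init, rng):
    """Simulate log-population sizes from the update in equation (1).

    b      : sequence (b_0, b_1, ..., b_k); k = len(b) - 1 is the model order
    sigma  : standard deviation of the process noise
    x_init : starting log-population sizes, at least max(k, 1) of them
    """
    k = len(b) - 1
    x = list(x_init)
    while len(x) < n_years:
        # density dependence term: sum over lags i = 1..k of b_i * exp(x_{t-i})
        dd = sum(b[i] * math.exp(x[-i]) for i in range(1, k + 1))
        x.append(x[-1] + b[0] + dd + rng.gauss(0.0, sigma))
    return x

def observe(x, s, rng):
    """Add Gaussian observation noise with known standard deviations s, as in (2)."""
    return [rng.gauss(xt, st) for xt, st in zip(x, s)]

rng = random.Random(1)
states = simulate_states(b=(0.5, -0.5), sigma=0.05, n_years=50, x_init=[0.0], rng=rng)
counts = observe(states, s=[0.05] * len(states), rng=rng)
```

With `b = (0.1,)` the density dependence sum is empty and the recursion reduces to a random walk with drift, which is the $k = 0$ case described above.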
2 Prior Selection

To perform a full Bayesian analysis and fit of this model, we need to specify a prior for each parameter that is not directly specified by the model itself.

We give a uniform prior to $k$, over $\{0, \ldots, 5\}$. We believe it is implausible that density dependent effects could operate on a longer timescale than this. In particular, the hypotheses that we wish to assess are only concerned with density dependence up to second order. Our prior gives no preference to one time lag over another in this range, so that we can assess the evidence provided by the data in favour of each model. This is similar to a Bayes Factor, which can be used to compare the fit of different models (Kass and Raftery, 1995).

An improper inverse gamma (0, 0) prior is assigned to $\sigma^2$. This is mostly for reasons of Bayesian conjugacy: the rate of learning is high for this parameter and the prior shape makes little difference.

The distribution of $x_{1:5}$ might not be specified by the model, depending on $k$. If $k = 2$ for example, then we need to specify the distribution of $x_1$ and $x_2$, and the model gives us the distribution of $x_{3:5}$. In order to have a consistent likelihood across all models, we consider the observed likelihood function $p(y_{1:5} \mid x_{1:5})$ as a (density) function of $x_{1:5}$ and treat it as our prior. Naturally we do not count it twice, so it is removed from the likelihood, as well as those systemic terms relating to the evolution of $x_{1:5}$. Thus, for all models, the first model-driven term in the likelihood is $p(x_6 \mid x_{1:5}, \sigma^2, b)$. The final parameter that requires a prior is $b$, but there is a pitfall to be aware of before we make our choice.



103 Lindley's Paradox 



104 
105 
106 



It has been known for some time (LindleyJ 1957 1 that choosing a vague (high-variance) prior for 
within-model parameters (except for parameters common to all of them, such as a 2 ) will bias the 
model selection routine in favour of simple models. This is discussed in depth in Link and Barker 
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107 (2006). In the limiting case where an improper flat prior is used, the posterior model probabilities 

me will always be degenerate in favour of the model with the fewest parameters. Lindley's paradox 

109 therefore implies that we cannot take a diffuse Normal prior for b, since this would lead to selecting 

no k = 0, even if the data produced a likelihood that was higher for other models (hence the paradox). 

in In light of this it is clear we must choose an informative prior, but the question arises as 

112 to how to choose an informative prior when one has, apparently, no information. We now show that 

113 an informative choice can be reached just by excluding certain pathological cases that we would 

114 not expect to arise in the biological systems in question. 



115 



2.1 Stability Considerations 



no The population evolution model is simple to simulate from. When one does so one notices that for 

117 certain parameter values, the population fluctuates wildly or grows very rapidly until the computer 

us suffers numerical overflow. However, for other values, the population reaches a stable threshold 

119 after a period of time (regardless of its starting value) and then does not move too far from this. 

120 We refer to this level as the carrying capacity, since it is the maximum level for which the expected 

121 population trajectory is not downwards. We would like to restrict our parameters to values that 

122 produce a (finite) carrying capacity (exempt from this is the null model k = 0, as it can never have 

123 a carrying capacity). We will demonstrate that a diffuse independent Normal prior does not always 

124 lead to the stable scenario, but there are other priors that do (at least much more often). 

Consider the deterministic analog of the model equation (1) with no measurement error, and suppose that we observe a string of $k$ years where the population is at a constant level $x_{1:k} = x$. Then

$$x_{k+1} = x + b_0 + \sum_{i=1}^{k} b_i e^{x}.$$

If $b_0$ and $\sum_{i=1}^{k} b_i$ are of different sign (and $k$ is at least 1), then we can solve for when $x_{k+1} = x$, and we find that this corresponds to

$$x^* = \log\left( \frac{-b_0}{\sum_{i=1}^{k} b_i} \right). \tag{3}$$

This exposes an inherent asymmetry in this model, that $b_0$ and the sum of the other components of $b$ need to be of different sign to produce stable populations. This is not captured in an independent Normal prior. In addition, it raises the problem of estimating the carrying capacity $x^*$. We are constructing a prior, so "peeking" at the data should be avoided where possible. The approach we suggest is to center the observed data (on the log scale) so that the carrying capacity should correspond approximately to $x^* = 0$. Thus if you have data on a well-established and stable species, you should center to the mean across all of the time series, whereas if you are analyzing a population that (say) only achieves a stable level in the last fifteen years of a 50-year study, then it should be centered so that the mean of the last fifteen years is zero on the log-scale. This is equivalent to multiplying the data by a constant so that the geometric mean over that time period is 1. Optionally, the carrying capacity could be introduced as another parameter and given a prior, but that is not an approach we consider, because it is difficult to get an independent estimate that might inform such a prior.

We have suggested centering data; however, the model is not invariant to such a transformation. For large populations, a density dependent effect of a particular magnitude will require a smaller $b_i$ than for smaller populations. This is because the $i$-th density dependent effect is equal to $b_i e^{x_{t-i}}$. If we do not center the data, we must incorporate some measure of the overall magnitude of the data into the prior (as done in Jamieson and Brooks (2004)). If we center, then we do not need to look at the data in order to inform our prior. If we take the carrying capacity to be $x^* = 0$, then, rearranging (3) we get

$$b_0 = -\sum_{i=1}^{k} b_i. \tag{4}$$

Thus $b_0$ is perfectly negatively correlated with the sum of the other components of $b$. If we take an independent Normal$(0, \sigma_b^2)$ prior for $b_{1:k}$ then this suggests that the joint prior for $b_{0:k}$ should be the degenerate Normal

$$b_{0:k} \sim N\left(0,\; \begin{pmatrix} k\sigma_b^2 & -\sigma_b^2 & \cdots & -\sigma_b^2 \\ -\sigma_b^2 & \sigma_b^2 & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2 & 0 & & \sigma_b^2 \end{pmatrix}\right). \tag{5}$$

This is degenerate in the sense that the covariance matrix does not have full rank, and only those values of $b$ for which (4) holds will have nonzero density. In practice this only applies to the deterministic model, and a small amount $h$ would be added to the variance of $b_0$ to allow for mis-estimation of $x^*$. This is because there will always be probabilistic drift towards the carrying capacity, and by allowing some additional variation in $b_0$, we introduce the requisite additional flexibility into the model. The choice of $h$ also dictates the prior under the null $k = 0$ model, so a reasonable value might be obtained by considering the variance of symmetric Gaussian random walks over time. For example, a value of $h = 0.04225$ corresponds to a process that is as likely as not to at least halve or double in five years. In other words, if $Z \sim N(0, 5^2 \times 0.04225)$, which is the distribution of the cumulative five-year drift $5 b_0$ when $b_0 \sim N(0, h)$, then $P(Z \in [-\log 2, \log 2]) = 1/2$. This is the value we use in all of our priors which have $h$ as a parameter.
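The calibration of $h$ can be checked numerically. The sketch below assumes that $h$ is read as the prior variance of the drift $b_0$ under the null model, so that the cumulative drift over five years is $5 b_0$ with standard deviation $5\sqrt{h}$; the helper names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within_factor_two(h, years=5):
    """P(|years * b_0| <= log 2) when the drift b_0 ~ N(0, h)."""
    sd = years * math.sqrt(h)  # standard deviation of the cumulative drift
    z = math.log(2.0) / sd
    return normal_cdf(z) - normal_cdf(-z)

p = prob_within_factor_two(0.04225)
print(round(p, 3))
```

Under this reading the probability is one half to three decimal places, so the population is as likely as not to stay within a factor of two of its starting level.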

We now consider the effect of small perturbations about carrying capacity. We will see that this restricts even further the set of parameter values that yield a dynamical system we might expect to see in a natural population.

Suppose that $x_{1:k} = (0, \ldots, 0, \delta)$. Then we may be in one of several scenarios (equation (3) is assumed to hold):

(a) $\sum_{i=1}^{k} b_i$ is positive. In this case, regardless of the sign of $\delta$, the population is unstable and will diverge from 0. Carrying capacity is undefined.

(b) $-1 < \sum_{i=1}^{k} b_i < 0$. The population returns monotonically towards capacity.

(c) $-2 < \sum_{i=1}^{k} b_i < -1$. The population oscillates around capacity, with decreasing magnitude.

(d) $\sum_{i=1}^{k} b_i < -2$. The population oscillates around zero, but usually with much greater magnitude than above. If all of $b_{1:k}$ are negative, then the oscillations will quickly reach a consistent (perhaps large) magnitude, but if any of $b_{1:k}$ are positive, then the population is probabilistically unbounded, i.e. with probability 1, as $t \to \infty$, $x_t \to \infty$. In the latter case, capacity is again undefined.
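The four scenarios can be illustrated in the simplest setting. The sketch below is restricted to $k = 1$ and to the deterministic recursion with $b_0 = -b_1$, so that the capacity sits at zero; the function name and the perturbation size $\delta = 0.1$ are illustrative assumptions, and the number of steps is kept small because scenario (a) grows quickly.

```python
import math

def deterministic_path(b1, delta, steps):
    """Iterate x_{t+1} = x_t + b_0 + b_1 * exp(x_t) with b_0 = -b_1 (capacity at 0)."""
    x = [delta]
    for _ in range(steps):
        x.append(x[-1] - b1 + b1 * math.exp(x[-1]))
    return x

# One value of b_1 from each of the scenarios (a) to (d)
for label, b1 in [("a", 0.5), ("b", -0.5), ("c", -1.5), ("d", -2.5)]:
    path = deterministic_path(b1, delta=0.1, steps=5)
    print(label, [round(v, 3) for v in path])
```

Near zero the recursion behaves like $x_{t+1} \approx (1 + b_1) x_t$, which is why the thresholds at $0$, $-1$ and $-2$ separate divergence, monotone return, damped oscillation and growing oscillation.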



Plots of simulated population trajectories for all four cases are given in figure 1. We contend that the second of these is most likely to be characteristic of a natural population, but that perhaps some allowance might be made for the third. The first and fourth are considered unlikely to arise in the natural world.

This means that had we chosen a prior of the form (5) then we would unintentionally be making a strong prior assumption about the model order. For example, if $k = 1$, then $\sum_{i=1}^{k} b_i$ is a $N(0, \sigma_b^2)$ random variable, with a corresponding probability of lying in $[-2, 0]$. If $k = 2$, then $\sum_{i=1}^{k} b_i$ has a $N(0, 2\sigma_b^2)$ distribution, with correspondingly reduced probability of lying in this interval. This could be thought of as a manifestation of Lindley's paradox. If for example $\sigma_b^2 = 1$, then the chance of being in the prior-plausible region under $k = 1$ would be 48%. Under $k = 5$, that chance shrinks to 31%. The difference is even more pronounced if $\sigma_b^2$ is higher. Thus, we would be accidentally favouring simple models.
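The quoted percentages follow directly from the Normal distribution of the sum. As a quick check, assuming independent $N(0, \sigma_b^2)$ components so that the sum has variance $k \sigma_b^2$ (the function names are hypothetical):

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_plausible(k, var_b=1.0):
    """P(-2 < sum_{i=1}^k b_i < 0) for independent b_i ~ N(0, var_b)."""
    sd = math.sqrt(k * var_b)
    return normal_cdf(0.0) - normal_cdf(-2.0 / sd)

for k in range(1, 6):
    print(k, round(prob_plausible(k), 3))
```

The probability falls steadily as $k$ grows, which is the sense in which the independent prior quietly penalises higher-order models before any data are seen.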

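A logical refinement of (5) is to keep the distribution of $\sum_{i=1}^{k} b_i$ constant, and to restrict to cases where $-2 < \sum_{i=1}^{k} b_i < 0$. This is

$$b_{0:k} \sim N\left(0,\; \begin{pmatrix} \sigma_b^2 + h & -\sigma_b^2/k & \cdots & -\sigma_b^2/k \\ -\sigma_b^2/k & \sigma_b^2/k & & 0 \\ \vdots & & \ddots & \\ -\sigma_b^2/k & 0 & & \sigma_b^2/k \end{pmatrix}\right) \tag{6}$$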






Prior Choice for Model Selection 



[Figure 1 image: four panels, one per parameter setting, each plotting the simulated $x$ against Index (0 to 50).]

Figure 1: Simulations from the autoregressive model, with $b = (1/2, -1/2)$, $(1, -1)$, $(3/2, -3/2)$ and $(5/2, -5/2)$. Note that the last of these is a stable exception to the usually unstable case $b_0 > 2$. $\sigma = 0.05$ for all of these, with the process driving the greatly increased variance for the last simulation. There is no measurement error, and we observe from $t = 100$ to $t = 150$ starting at $(x_1, x_2) = (0, 0)$.



restricted to the aforementioned set. This is easy and quick to sample from by rejection sampling. This prior also has the attractive property that the marginal distribution of $b_0$ is the same under all models except $k = 0$, so we are equally willing to entertain density dependence effects at different time-lags. Nor have we unintentionally biased our model prior towards small $k$, since the prior probability of a model where carrying capacity is defined is the same for all $k > 0$.
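A minimal sketch of the rejection sampler is given below. It assumes the covariance structure in (6) is read as independent lag effects $b_i \sim N(0, \sigma_b^2 / k)$ with $b_0 = -\sum_i b_i$ plus independent $N(0, h)$ noise, which reproduces the variance $\sigma_b^2 + h$ for $b_0$ and the covariances $-\sigma_b^2 / k$. The function name and default values are illustrative.

```python
import math
import random

def sample_shrinkage_prior(k, var_b=1.0, h=0.04225, rng=random):
    """Draw (b_0, ..., b_k) by rejection, keeping draws with -2 < sum(b_1..b_k) < 0."""
    sd_lag = math.sqrt(var_b / k)
    while True:
        lags = [rng.gauss(0.0, sd_lag) for _ in range(k)]
        total = sum(lags)
        if -2.0 < total < 0.0:
            # The restriction only involves b_1..b_k, so b_0 can be drawn
            # afterwards from its conditional distribution N(-total, h).
            b0 = -total + rng.gauss(0.0, math.sqrt(h))
            return [b0] + lags

rng = random.Random(0)
draws = [sample_shrinkage_prior(3, rng=rng) for _ in range(200)]
```

Because the sum of the lag effects has the same $N(0, \sigma_b^2)$ distribution for every $k > 0$, the acceptance rate (about 48% when $\sigma_b^2 = 1$) does not depend on the model order.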

2.2 Shrinkage

The principle of shrinkage derives from the classical problem of estimating the mean of a multivariate Normal distribution, subject to assumptions about its variance. It can be shown (Stein, 1955) that simply taking the sample mean is inadmissible, provided the dimension is at least three. In other words, a shrunk estimate will provide better (in terms of mean square error) predictions of future observations drawn from the same distribution. We use this idea to motivate an alternative choice of prior, which will have an artificially reduced variance.

It must be noted that the improved predictive power shrinkage allows is at the cost of bias. Such bias-variance tradeoffs are common in model selection problems.



3 Analysis of Simulated Data 



Before we look at observed abundance data, we analyze some simulations of populations which follow the specified dynamics. We have two simulated datasets with the parameters (1) $k = 1$, $b = (0.5, -0.5)$ and (2) $k = 2$, $b = (0.5, -0.1, -0.4)$. Both simulations share the parameters $\sigma = 0.05 = S_t$ for each $t$. Both series have 501 years of data (this is considerably longer than the real survey, so we can see how much we can expect to learn about the model parameters in the future).

We consider five prior choices for $b$:

1. Independent Normal, variance 5 (primarily as an illustration of Lindley's Paradox).

2. Independent Normal, variance 1 (a baseline for comparison).

3. Multivariate Normal with covariance matrix from the modified version of (5) and $\sigma_b^2 = 1$, $h = 0.04225$.

4. A shrinkage-inspired prior: Normal with covariance matrix based on (6) and again $\sigma_b^2 = 1$, $h = 0.04225$.

5. As (4), but with smaller variance ascribed to later components of $b$:

$$b_{0:k} \sim N\left(0,\; \begin{pmatrix} \sigma_b^2 + h & -\sigma_b^2 d & -\sigma_b^2 d/2 & \cdots & -\sigma_b^2 d/k \\ -\sigma_b^2 d & \sigma_b^2 d & & & 0 \\ -\sigma_b^2 d/2 & & \sigma_b^2 d/2 & & \\ \vdots & & & \ddots & \\ -\sigma_b^2 d/k & 0 & & & \sigma_b^2 d/k \end{pmatrix}\right) \tag{8}$$

$d$ is suitably defined so that the sum of variances of $b_{1:k}$ is $\sigma_b^2$. In fact under this restriction

$$d = \left( \sum_{i=1}^{k} \frac{1}{i} \right)^{-1}$$

in equation (8) for $k \geq 1$. Notice that both priors (4) and (5) have the same total variance for $b_0$, as long as $k > 0$. This is deliberate, as discussed earlier.
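The constant $d$ and the resulting covariance matrix for prior (5) can be computed as follows. This sketch assumes the matrix is laid out with variance $\sigma_b^2 + h$ for $b_0$, variance $\sigma_b^2 d / i$ for $b_i$, covariance $-\sigma_b^2 d / i$ between $b_0$ and $b_i$, and zeros elsewhere; the function names are hypothetical.

```python
def decay_constant(k):
    """d = 1 / sum_{i=1}^k (1/i), so the lag variances var_b * d / i sum to var_b."""
    return 1.0 / sum(1.0 / i for i in range(1, k + 1))

def decaying_covariance(k, var_b=1.0, h=0.04225):
    """Covariance matrix of (b_0, ..., b_k) with variances shrinking like 1/i."""
    d = decay_constant(k)
    size = k + 1
    cov = [[0.0] * size for _ in range(size)]
    cov[0][0] = var_b + h
    for i in range(1, size):
        cov[i][i] = var_b * d / i
        cov[0][i] = cov[i][0] = -var_b * d / i
    return cov

cov = decaying_covariance(5)
total_lag_variance = sum(cov[i][i] for i in range(1, 6))
```

For $k = 1$ the constant is $d = 1$, and priors (4) and (5) coincide; the two only differ in how the fixed total variance $\sigma_b^2$ is shared among the lags.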



The choice between the last two priors largely depends on whether one considers the assumption that longer lags tend to be smaller in size to be suitable a priori. We will see that they do not provide substantially different estimates or predictions, but then we only consider simulations for low values of $k$.
We use a Particle Learning method (Carvalho et al., 2010) combined with Reversible Jump MCMC (Green, 1995) to produce a sample from the posterior for each simulation. This produces a weighted sample from the posterior distribution of models, parameters and hidden states. We are also able to chart the posterior as it evolves over time, as more data are added.



3.1 Model Selection Results 



The evolution of the posterior for $k$ in the $k = 1$ simulation is shown in figure 2. The results for the second simulation are not pictured; they are qualitatively similar (except that the majority of the posterior mass is on $k = 2$ instead of $k = 1$, after a similar period of time). This figure makes it clear that while we can learn the value of $k$, we can only do so slowly and given that we only have fifty or so years of duck abundance data, we cannot expect a conclusive model selection posterior. This makes it particularly important that we choose a parameter prior that will not influence the model selection process, since the signal from the data is quite weak.

[Figure 2 image: four panels, each titled "Evolution of Model Posterior with Time": (a) Independent prior, variance 5; (b) Independent prior, variance 1; (c) Correlated prior, variance 1; (d) Shrinkage prior, variance $1/k$.]

Figure 2: Evolution of the posterior model distribution over a long time span when $k = 1$, for four different prior choices (1)-(4). The fifth posterior is almost identical to the fourth.

Another point of interest is the posterior at $t = 6$, i.e. after only one residual is taken into account. This is the first point at which we have a model posterior, and we can see for the independent priors that this posterior is far from uniform. This is one quantification of the model selection bias induced by the independent priors.

4 Analysis of Observed Data

In total, eleven species are analyzed. Seven of these are dabbling ducks: Mallard (Anas platyrhynchos), American Wigeon (Anas americana), Gadwall (Anas strepera), Green-Winged Teal (Anas crecca), Blue-Winged Teal (Anas discors), Northern Shoveler (Anas clypeata) and Northern Pintail (Anas acuta). The remaining four are diving ducks, two of which are amalgamated: Redhead (Aythya americana), Canvasback (Aythya valisineria) and Greater and Lesser Scaup (Aythya marila and Aythya affinis).

The data, as supplied by US Fish and Wildlife Service (2010), include both an estimated annual count and an estimate of the observation error. We treat the observation error as exact. The posterior model probabilities for each species, using the shrinkage prior (4), are summarised in the table captioned Figure 3.

None of the posteriors are conclusive as to the order of density dependence. We expect this from the simulation study; even with data that we know follows a particular instance of the model, we can only expect perhaps a 60% posterior probability for that model after this length of time. It would be optimistic to expect the same level of agreement with real data.
Species       k = 0    k = 1    k = 2    k = 3    k = 4    k = 5
Mallard       0.1718   0.1882   0.208    0.0783   0.2399   0.1138
A.Wigeon      0.0238   0.4192   0.2637   0.1337   0.0601   0.0995
Gadwall       0.6805   0.1664   0.0553   0.0191   0.045    0.0338
G.W.Teal      0.6816   0.0982   0.052    0.053    0.0588   0.0563
B.W.Teal      0.4422   0.3197   0.1347   0.0582   0.0276   0.0176
N.Shoveler    0.4906   0.0756   0.2494   0.0942   0.0421   0.0481
N.Pintail     0.2733   0.2319   0.2707   0.1057   0.0322   0.0862
Redhead       0.3239   0.0671   0.2005   0.1489   0.1355   0.124
Canvasback    0.0299   0.5284   0.192    0.0939   0.0922   0.0636
Scaup         0.5764   0.1454   0.1349   0.0684   0.0418   0.0331

Figure 3: Posterior model probabilities for each duck species, using a shrinkage prior.
4.1 Predictive Accuracy

It is impossible to assess the quality of the $k$ posterior, since we have nothing with which to compare it. We can however look at the ability of the posterior at a given time point to make predictions of future numbers. These can then be compared with our best guess of the truth for that year (which the predictions were made without knowledge of). A simple quantity that measures predictive accuracy is the one step ahead Mean Square Error $\mathrm{MSE}(t) = E(\hat{x}_t - \tilde{x}_t)^2$, where $\hat{x}_t$ is the prediction of $x_t$ from the particle set at time $t - 1$ and $\tilde{x}_t$ is the "smoothed" state estimated from the particle set at time $T$. We seek to minimize MSE. As a typical example of the relative performance of each prior, figure 4 shows the evolution of the MSE over time, using the $N(0, 5)$ prior as a baseline, for the American Wigeon data. After a certain time, the MSE becomes approximately equal for all priors. This shows that the data have overwhelmed the prior in terms of information. Before then, there is significant disparity in MSE for the different priors, and while the correlated prior offers a mild improvement over the independent one, the shrinkage priors clearly outperform the others for up to 30 years.

[Figure 4 image: plot titled "Predictive Accuracy Relative to Normal(0,5)".]

Figure 4: Mean squared predictive error for different priors, scaled so that the Normal(0,5) prior is at 100%. American Wigeon data.

The MSE can sometimes be slightly misleading, since predictions are correlated (as are the quantities they are predicting). One measure to correct this is the Mahalanobis distance (Mahalanobis, 1936). This is based on taking a Gaussian approximation to the predictive distribution and calculating the expected total squared error over the whole time series. It is given by

$$D_M = (\hat{x} - \tilde{x})^T \Sigma^{-1} (\hat{x} - \tilde{x}), \tag{9}$$

with $\hat{x}$ the vector of one step ahead predictions, $\tilde{x}$ the vector of smoothed states and $\Sigma$ the covariance matrix of the Gaussian approximation. The Mahalanobis distance is not a function of time; it measures performance from start to finish. A low Mahalanobis distance is indicative of good overall predictive accuracy. When we calculate the Mahalanobis distance under each choice of prior for each species, we obtain the table captioned Figure 5. The story is broadly the same for all the species, as follows. The independent priors have much more predictive error than the correlated ones (the high-variance prior being worst). The shrinkage priors, as expected, offer improvements over all the others; however, there is little difference in accuracy between the two types of shrinkage prior.
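For readers who wish to reproduce this kind of summary, a minimal sketch of the quadratic form in equation (9) follows. It is not the code used for the analysis: the function names are hypothetical, the covariance matrix is assumed to be supplied as a list of rows, and a small Gaussian elimination routine stands in for a linear algebra library.

```python
def solve(a, rhs):
    """Solve a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z

def mahalanobis(pred, smoothed, cov):
    """Quadratic form (pred - smoothed)^T cov^{-1} (pred - smoothed)."""
    diff = [p - s for p, s in zip(pred, smoothed)]
    z = solve(cov, diff)  # z = cov^{-1} diff without forming the inverse
    return sum(d * v for d, v in zip(diff, z))
```

With an identity covariance the measure reduces to the total squared error; positive correlation between errors of the same sign lowers the distance, which is how the measure accounts for correlated predictions.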







Species       N(0,5)   N(0,1)   Corr.   Shrink.1   Shrink.2
Mallard       20726    5416     1767    1622       1575
A.Wigeon      4803     1616     949     818        807
Gadwall       2884     1498     1302    1044       1029
G.W.Teal      4266     2171     1717    1326       1396
B.W.Teal      7553     2286     1088    1035       992
N.Shoveler    4384     1789     1268    1122       1130
N.Pintail     19561    5228     2367    1798       1763
Redhead       3361     1636     1117    1005       981
Canvasback    3118     1073     529     479        481
Scaup         16153    3702     1037    972        879

Figure 5: Mahalanobis Error for different prior choices.

4.2 Interpretation of Results

We see that for most species, we cannot discount the possibility that $k = 0$. This can be interpreted in a few different ways. The simplest explanation (which is also the least likely to be true in the authors' opinion) is that the species do not show density dependent dynamics. It is also possible that the species are in fact far from carrying capacity, so that the density dependent effects are too small to be measured. In that case a hypothesis must be made as to what is keeping the species from reaching capacity, and that is beyond the scope of this study. It might be possible that the numerical nature of the density dependence cannot be projected onto this class of models. If we were to observe the species over a longer time period, or where it were closer to capacity, these differences would likely present themselves in the form of evidence for $k > 0$ in the posterior.

It is interesting to note that the Mallard (which is the only species for which the posterior-preferred model is greater than $k = 2$) is also the species with the highest population count. This is potentially indicative that intra-species competition is a major factor in dabbling ducks, as there is a strong negative correlation between total count and posterior probability that $k = 0$ for dabbling ducks. This correlation also could be taken as evidence of the generalist predator hypothesis, which would argue that changes in duck recruitment (i.e. changes to $b_0$) would be met with immediate responses from the predator (so that in fact $b_0$ might change from year to year, but in a way that is probabilistically equivalent to the $k = 0$ model with the variance being added to $\sigma^2$ instead).

The picture is somewhat different for diving ducks. The aforementioned correlation between raw numbers and apparent density dependence is not evident here. Again, this is consistent with the generalist predator hypothesis which, taken in conjunction with the reports from Sargeant et al. (1984) about diving ducks being much less vulnerable to this kind of predation, would suggest a different density dependent structure from that of dabbling ducks. Even here though, there is still appreciable posterior probability that $k = 0$ in two out of three cases.



The hypothesis of Jamieson and Brooks (2004), that diving ducks were in general more density dependent than dabbling ducks, is not really borne out by this analysis. The authors of that paper used independent priors with different variance for each species. As one example, for the Blue-Winged Teal, the authors had an independent prior variance of 3, and came to a posterior that was 73% in favour of $k = 0$, and almost all the rest of the mass was for $k = 1$. We have demonstrated that this is largely an artifact of Lindley's Paradox and our posterior is much less conclusive.



5 Discussion

We hope that we have demonstrated the importance of a considered choice of prior. A default choice is rarely safe in model selection problems, and we have shown how, by considering whether the carrying capacity is well-defined and trying to exclude cases where it isn't, we can arrive at an informative prior without peeking at the data.

A more general principle is that of excluding so-called 'unphysical' possibilities from the prior, that is, not allowing parameters to take values which would produce behaviour we know does not happen. We excluded models which did not give rise to a well-defined carrying capacity; the precise nature of the prior restrictions will vary from problem to problem.

It is important to consider how a parameter's prior varies between models: a parameter with a different interpretation in different models may well require a different prior in each case. In our example $b_0$ typically had a prior that was different under the null model $k = 0$ than in more complex cases. This mirrored the fact that in the null model $b_0$ was interpreted as an overall drift, whereas otherwise it was the counterbalance to the density dependence effects.

When we exercise such caution in choosing our parameter priors, we are in a position to judge much more effectively whether the data provide evidence in favour of our hypotheses or not.